Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated by computer from raw corpora.
The information has therefore not been validated.

Book Layout Analysis: TOC Structure Extraction Engine

Internal identifier: 000A05 (Main/Exploration); previous: 000A04; next: 000A06

Authors: Bodin Dresevic [Serbia]; Aleksandar Uzelac [Serbia]; Bogdan Radakovic [Serbia]; Nikola Todic [Serbia]

RBID : ISTEX:3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104

Abstract

Scanned and then OCRed documents usually lack detailed layout and structural information. We present a book-specific layout analysis system used to extract table of contents (TOC) structure information from scanned and OCRed books. This system was used for navigation purposes by the live books search project. We provide a labeling scheme for the TOC sections of books, a high-level overview of the book layout analysis system, and a description of the TOC Structure Extraction Engine. Finally, we present accuracy measurements of this system on a representative test set.
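The abstract does not say how the engine parses TOC entries. As a purely illustrative sketch, and not the authors' method, the Python below assumes a simple heuristic. Each OCRed TOC line ends with a page number, optionally preceded by dot leaders. Its hierarchy level is inferred from a "Chapter N" prefix or from the depth of a dotted section number such as 2.1.1. The names `parse_line`, `build_tree`, `TocEntry` and `SAMPLE_TOC` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative heuristic only, NOT the paper's TOC Structure Extraction Engine.
# A TOC line is assumed to be "<body><spaces or dot leaders><page number>".
# The lazy body stops at the first point where only leaders and digits remain.
LINE_RE = re.compile(r"^(?P<body>.*?)[\s.]*?(?P<page>\d+)\s*$")
# Optional numbering at the start of the body: "Chapter 3" or "2.1.1".
NUMBER_RE = re.compile(
    r"^(?:Chapter\s+(?P<chapter>\d+)|(?P<section>\d+(?:\.\d+)*))\s+(?P<title>.+)$"
)

# Hypothetical OCR output of a TOC page, used for demonstration.
SAMPLE_TOC = """Contents
Chapter 1 Introduction .......... 1
1.1 Background ..... 3
1.2 Scope 7
Chapter 2 Methods ..... 15
2.1 Data 17
2.1.1 Sources .... 18"""


@dataclass
class TocEntry:
    label: str   # e.g. "Chapter 1", "2.1", or "" for unnumbered entries
    title: str
    page: int
    level: int   # 1 = top level; the synthetic root uses 0
    children: List["TocEntry"] = field(default_factory=list)


def parse_line(line: str) -> Optional[TocEntry]:
    """Parse one OCRed TOC line; return None if it has no trailing page number."""
    m = LINE_RE.match(line.strip())
    if m is None or not m.group("body"):
        return None
    body, page = m.group("body").strip(), int(m.group("page"))
    n = NUMBER_RE.match(body)
    if n is None:
        # Unnumbered heading (e.g. "Preface"): treat it as top level.
        return TocEntry("", body, page, 1)
    if n.group("chapter"):
        return TocEntry(f"Chapter {n.group('chapter')}", n.group("title"), page, 1)
    section = n.group("section")
    # Nesting depth follows the number of dotted components: "2.1.1" -> 3.
    return TocEntry(section, n.group("title"), page, section.count(".") + 1)


def build_tree(entries: List[TocEntry]) -> List[TocEntry]:
    """Nest entries by level using a stack of currently open ancestors."""
    root = TocEntry("", "ROOT", 0, 0)
    stack = [root]
    for entry in entries:
        # Close every open entry at the same or a deeper level.
        while stack[-1].level >= entry.level:
            stack.pop()
        stack[-1].children.append(entry)
        stack.append(entry)
    return root.children


def show(nodes: List[TocEntry], indent: int = 0) -> None:
    """Print the tree with two spaces of indentation per level."""
    for node in nodes:
        print("  " * indent + f"{node.label} {node.title} [{node.page}]".strip())
        show(node.children, indent + 1)


if __name__ == "__main__":
    parsed = [e for e in map(parse_line, SAMPLE_TOC.splitlines()) if e is not None]
    show(build_tree(parsed))
```

The stack keeps each entry under the most recent open entry of a lower level. A real system working on OCR output would also have to cope with roman-numeral page numbers, entries wrapped over several lines, and recognition noise in the numbering, none of which this heuristic handles.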

URL: https://api.istex.fr/document/3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104/fulltext/pdf
DOI: 10.1007/978-3-642-03761-0_17


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Book Layout Analysis: TOC Structure Extraction Engine</title>
<author>
<name sortKey="Dresevic, Bodin" sort="Dresevic, Bodin" uniqKey="Dresevic B" first="Bodin" last="Dresevic">Bodin Dresevic</name>
</author>
<author>
<name sortKey="Uzelac, Aleksandar" sort="Uzelac, Aleksandar" uniqKey="Uzelac A" first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name>
</author>
<author>
<name sortKey="Radakovic, Bogdan" sort="Radakovic, Bogdan" uniqKey="Radakovic B" first="Bogdan" last="Radakovic">Bogdan Radakovic</name>
</author>
<author>
<name sortKey="Todic, Nikola" sort="Todic, Nikola" uniqKey="Todic N" first="Nikola" last="Todic">Nikola Todic</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-03761-0_17</idno>
<idno type="url">https://api.istex.fr/document/3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001C94</idno>
<idno type="wicri:Area/Istex/Curation">001B75</idno>
<idno type="wicri:Area/Istex/Checkpoint">000527</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Dresevic B:book:layout:analysis</idno>
<idno type="wicri:Area/Main/Merge">000A13</idno>
<idno type="wicri:Area/Main/Curation">000A05</idno>
<idno type="wicri:Area/Main/Exploration">000A05</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Book Layout Analysis: TOC Structure Extraction Engine</title>
<author>
<name sortKey="Dresevic, Bodin" sort="Dresevic, Bodin" uniqKey="Dresevic B" first="Bodin" last="Dresevic">Bodin Dresevic</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Serbie</country>
<wicri:regionArea>Microsoft Development Center Serbia, Makedonska 30, 11000, Belgrade</wicri:regionArea>
<wicri:noRegion>Belgrade</wicri:noRegion>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: bodind@microsoft.com</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Uzelac, Aleksandar" sort="Uzelac, Aleksandar" uniqKey="Uzelac A" first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Serbie</country>
<wicri:regionArea>Microsoft Development Center Serbia, Makedonska 30, 11000, Belgrade</wicri:regionArea>
<wicri:noRegion>Belgrade</wicri:noRegion>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: aleksandar.uzelac@microsoft.com</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Radakovic, Bogdan" sort="Radakovic, Bogdan" uniqKey="Radakovic B" first="Bogdan" last="Radakovic">Bogdan Radakovic</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Serbie</country>
<wicri:regionArea>Microsoft Development Center Serbia, Makedonska 30, 11000, Belgrade</wicri:regionArea>
<wicri:noRegion>Belgrade</wicri:noRegion>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: bogdan.radakovic@microsoft.com</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Todic, Nikola" sort="Todic, Nikola" uniqKey="Todic N" first="Nikola" last="Todic">Nikola Todic</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Serbie</country>
<wicri:regionArea>Microsoft Development Center Serbia, Makedonska 30, 11000, Belgrade</wicri:regionArea>
<wicri:noRegion>Belgrade</wicri:noRegion>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: nikola.todic@microsoft.com</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104</idno>
<idno type="DOI">10.1007/978-3-642-03761-0_17</idno>
<idno type="ChapterID">17</idno>
<idno type="ChapterID">Chap17</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Scanned then OCRed documents usually lack detailed layout and structural information. We present a book specific layout analysis system used to extract TOC structure information from the scanned and OCRed books. This system was used for navigation purposes by the live books search project. We provide labeling scheme for the TOC sections of the books, high level overview for the book layout analysis system, as well as TOC Structure Extraction Engine. In the end we present accuracy measurements of this system on a representative test set.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Serbie</li>
</country>
</list>
<tree>
<country name="Serbie">
<noRegion>
<name sortKey="Dresevic, Bodin" sort="Dresevic, Bodin" uniqKey="Dresevic B" first="Bodin" last="Dresevic">Bodin Dresevic</name>
</noRegion>
<name sortKey="Radakovic, Bogdan" sort="Radakovic, Bogdan" uniqKey="Radakovic B" first="Bogdan" last="Radakovic">Bogdan Radakovic</name>
<name sortKey="Todic, Nikola" sort="Todic, Nikola" uniqKey="Todic N" first="Nikola" last="Todic">Nikola Todic</name>
<name sortKey="Uzelac, Aleksandar" sort="Uzelac, Aleksandar" uniqKey="Uzelac A" first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name>
</country>
</tree>
</affiliations>
</record>
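As printed here, the record declares no default namespace, so its fields can be read with Python's standard `xml.etree.ElementTree`. The following is a minimal sketch that runs on a trimmed excerpt of the record. Because the page uses the `wicri:` prefix without declaring it, which an XML parser rejects as an unbound prefix, the sketch first binds that prefix to a placeholder URI. `urn:example:wicri` is an assumption, not Wicri's real namespace. `declare_wicri_prefix`, `extract_metadata` and `RECORD_EXCERPT` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: the record binds no namespace to the "wicri:" prefix, which the
# parser would reject. This placeholder URI is hypothetical.
WICRI_NS = "urn:example:wicri"

# Trimmed excerpt of the record above (attributes such as sortKey omitted).
RECORD_EXCERPT = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Book Layout Analysis: TOC Structure Extraction Engine</title>
<author><name first="Bodin" last="Dresevic">Bodin Dresevic</name></author>
<author><name first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name></author>
<author><name first="Bogdan" last="Radakovic">Bogdan Radakovic</name></author>
<author><name first="Nikola" last="Todic">Nikola Todic</name></author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-03761-0_17</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def declare_wicri_prefix(record_xml: str) -> str:
    """Bind the wicri: prefix on the first <record> start tag."""
    return record_xml.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)


def extract_metadata(record_xml: str) -> dict:
    """Read title, authors, year and DOI from the record's teiHeader."""
    root = ET.fromstring(declare_wicri_prefix(record_xml))
    file_desc = root.find("./TEI/teiHeader/fileDesc")
    if file_desc is None:
        raise ValueError("record has no TEI/teiHeader/fileDesc")
    date = file_desc.find("publicationStmt/date")
    return {
        "title": file_desc.findtext("titleStmt/title"),
        "authors": [n.text for n in file_desc.findall("titleStmt/author/name")],
        "year": date.get("year") if date is not None else None,
        # ElementTree's path syntax supports [@attr='value'] predicates.
        "doi": file_desc.findtext("publicationStmt/idno[@type='doi']"),
    }


if __name__ == "__main__":
    for key, value in extract_metadata(RECORD_EXCERPT).items():
        print(f"{key}: {value}")
```

Once the prefix is bound, the same paths should work on the full record. The identifiers under `<biblStruct>` (ISSN, ChapterID) can be read the same way with further `findtext` calls.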

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A05 | SxmlIndent | more

Or

EXPLOR_AREA=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A05 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104
   |texte=   Book Layout Analysis: TOC Structure Extraction Engine
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024